On Coreset Constructions for the Fuzzy $K$-Means Problem

Authors

  • Johannes Blömer
  • Sascha Brauer
  • Kathrin Bujna
Abstract

In this paper, we present coreset constructions for the fuzzy K-means problem. First, we show that one can construct weak coresets for fuzzy K-means. Second, we show that there are coresets for fuzzy K-means with respect to balanced fuzzy K-means solutions. Third, we use these coresets to develop a randomized approximation algorithm whose runtime is polynomial in the number of given points and in their dimension.


Related resources

Complexity and Approximation of the Fuzzy K-Means Problem

The fuzzy K-means problem is a generalization of the classical K-means problem to soft clusterings, i.e. clusterings where each point belongs to each cluster to some degree. Although popular in practice, prior to this work the fuzzy K-means problem has not been studied from a complexity theoretic or algorithmic perspective. We show that optimal solutions for fuzzy K-means cannot, in general, b...
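
To make the notion of a soft clustering concrete, here is a minimal sketch of the standard fuzzy K-means objective, in which point x contributes the sum over clusters of r_k^m times its squared distance to center k, with memberships r_k summing to 1. The function names and the default fuzzifier m = 2 are assumptions chosen for illustration and are not taken from the paper.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def optimal_memberships(point, centers, m=2.0):
    """Memberships of one point that minimize the fuzzy cost for fixed centers.

    Assumes a fuzzifier m > 1. Memberships are proportional to
    (squared distance) ** (-1 / (m - 1)) and sum to 1.
    """
    d2 = [squared_distance(point, c) for c in centers]
    # A point lying exactly on one or more centers puts all its mass there.
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def fuzzy_cost(points, centers, m=2.0):
    """Fuzzy K-means cost of the centers, using optimal memberships."""
    cost = 0.0
    for x in points:
        r = optimal_memberships(x, centers, m)
        cost += sum((r_k ** m) * squared_distance(x, c)
                    for r_k, c in zip(r, centers))
    return cost
```

A point halfway between two centers receives membership 0.5 in each, whereas a hard K-means assignment would force it into a single cluster.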


StreamKM++: A Clustering Algorithm for Data Streams

We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [1]. To compute the small sample, we propose two new techniques. First, we use a non-uniform sampling approach similar to the k-means++ seeding procedure to obtain small core...
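
The non-uniform sampling mentioned in this abstract resembles k-means++ seeding, where each new point is drawn with probability proportional to its squared distance to the nearest point already chosen. The sketch below illustrates only that seeding-style sampling step; it is not the StreamKM++ algorithm itself, and the function name `d2_sample` and its `seed` parameter are hypothetical.

```python
import random


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def d2_sample(points, k, seed=0):
    """Draw k points by D^2 sampling, as in k-means++ seeding (illustrative)."""
    rng = random.Random(seed)
    # The first point is chosen uniformly at random.
    chosen = [rng.choice(points)]
    while len(chosen) < k:
        # Weight of a point: squared distance to its nearest chosen point.
        weights = [min(squared_distance(p, c) for c in chosen) for p in points]
        if sum(weights) == 0.0:
            # Every point coincides with a chosen one; fall back to uniform.
            chosen.append(rng.choice(points))
        else:
            chosen.append(rng.choices(points, weights=weights, k=1)[0])
    return chosen
```

Points that are already chosen have weight zero, so the sample tends to spread across well-separated regions of the data.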


Scalable and Distributed Clustering via Lightweight Coresets

Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of coresets called lightweight cor...


Coresets and approximate clustering for Bregman divergences

We study the generalized k-median problem with respect to a Bregman divergence $D_\varphi$. Given a finite set $P \subseteq \mathbb{R}^d$ of size $n$, our goal is to find a set $C$ of size $k$ such that the sum of errors $\mathrm{cost}(P, C) = \sum_{p \in P} \min_{c \in C} D_\varphi(p, c)$ is minimized. The Bregman k-median problem plays an important role in many applications, e.g. information theory, statistics, text classification, and speech processing. We g...
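
The cost function in this abstract can be evaluated directly once a divergence is fixed. The following sketch uses two well-known Bregman divergences, the squared Euclidean distance and the generalized Kullback-Leibler divergence; the function names are assumptions made for illustration and do not come from the paper.

```python
import math


def squared_euclidean(p, c):
    """Bregman divergence generated by phi(x) = ||x||^2."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def generalized_kl(p, c):
    """Bregman divergence generated by phi(x) = sum_i x_i log x_i.

    Assumes all coordinates of p and c are strictly positive.
    """
    return sum(a * math.log(a / b) - a + b for a, b in zip(p, c))


def bregman_cost(points, centers, divergence):
    """cost(P, C): each point pays its divergence to the cheapest center."""
    return sum(min(divergence(p, c) for c in centers) for p in points)
```

Note that the point is the first argument of the divergence and the center the second; Bregman divergences are in general not symmetric, so the order matters.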


StreamKM++: A Clustering Algorithm for Data Streams

We develop a new k-means clustering algorithm for data streams of points from a Euclidean space. We call this algorithm StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm of Arthur and Vassilvitskii (SODA '07). To compute the small sample, we propose two new techniques. First, we use an adaptive, non-u...



Journal:
  • CoRR

Volume: abs/1612.07516

Publication date: 2016